
6 research outputs found

Node Classification in Uncertain Graphs

Author: Aggarwal Charu
Dallachiesa Michele
Palpanas Themis
Publication date: 01/01/2014

In many real applications that use and analyze networked data, the links in the network graph may be erroneous or derived from probabilistic techniques. In such cases, the node classification problem can be challenging, because the unreliability of the links may affect the final results of the classification process. If information about link reliability is not used explicitly, classification accuracy in the underlying network may suffer. In this paper, we focus on situations that require the analysis of the uncertainty present in the graph structure. We study the novel problem of node classification in uncertain graphs by treating uncertainty as a first-class citizen. We propose two techniques based on a Bayes model and automatic parameter selection, and show that incorporating uncertainty explicitly into the classification process is beneficial. We experimentally evaluate the proposed approach on several real data sets and study the behavior of the algorithms under different conditions. The results demonstrate the effectiveness and efficiency of our approach.

arXiv.org e-Print Archive

CiteSeerX

Crossref
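The abstract contrasts using link reliability explicitly with ignoring it. As a hedged illustration only, here is a minimal Python sketch of a simple baseline, not the paper's Bayes model or its automatic parameter selection: each labeled neighbor votes for its class with a weight equal to the assumed existence probability of the connecting edge, and additive smoothing turns the votes into normalized class scores. The function name `classify_node`, the edge-dictionary representation, and the smoothing parameter `alpha` are hypothetical.

```python
def classify_node(node, edges, labels, classes, alpha=1.0):
    """Return normalized class scores for `node` (hypothetical baseline).

    edges:  {(u, v): p}, where p is the probability that undirected edge u-v exists.
    labels: {node: class} for the already-labeled nodes.
    alpha:  additive smoothing added to every class score.
    """
    # Start every class at the smoothing constant so no score is zero.
    scores = {c: alpha for c in classes}
    for (u, v), p in edges.items():
        if node not in (u, v):
            continue  # this edge does not touch the node being classified
        other = v if u == node else u
        if other in labels:
            # Expected vote: the neighbor's label counts only in the possible
            # worlds where the edge exists, so it contributes weight p.
            scores[labels[other]] += p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


edges = {("a", "b"): 0.9, ("a", "c"): 0.2, ("a", "d"): 0.4}
labels = {"b": "x", "c": "y", "d": "y"}
scores = classify_node("a", edges, labels, ["x", "y"])
print(max(scores, key=scores.get))  # prints "x"
```

Two of node `a`'s three neighbors carry label `y`, so a count that treats every edge as certain would pick `y`; weighting by the assumed edge probabilities picks `x`, because the single `x` edge is far more reliable than the two `y` edges.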

Modeling and Querying Data Series and Data Streams with Uncertainty

Author: Dallachiesa Michele
Publication venue: University of Trento
Publication date: 11/04/2014

Many real applications consume data that is intrinsically uncertain and error-prone. An uncertain data series is a series whose point values are uncertain. An uncertain data stream is a data stream whose tuples are existentially uncertain and/or have uncertain values. Typical sources of uncertainty in data series and data streams include sensor data, data synopses, privacy-preserving transformations, and forecasting models. In this thesis, we focus on three problems: (1) the formulation and evaluation of similarity search queries over uncertain data series; (2) the evaluation of nearest neighbor search queries over uncertain data series; and (3) the adaptation of sliding windows in uncertain data stream processing to accommodate existential and value uncertainty. We demonstrate experimentally that the correlation among neighboring timestamps in data series can be leveraged to increase the accuracy of the results. We further show that "possible world" semantics can serve as the underlying uncertainty model for formulating nearest neighbor queries that can be evaluated efficiently. Finally, we discuss the relation between existential and value uncertainty in data stream applications, and experimentally verify our proposed uncertain sliding windows.

Unitn-eprints PhD
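The thesis abstract refers to "possible world" semantics and to existential uncertainty in sliding windows. A minimal sketch, assuming each tuple in a count-based window exists independently with a known probability and ignoring value uncertainty: it computes the probability that exactly k tuples are present, first by enumerating every possible world and then with an equivalent dynamic program that avoids the exponential enumeration. The function names and the independence assumption are introduced here for illustration; this is not the uncertain sliding window definition proposed in the thesis.

```python
from collections import deque
from itertools import product


def count_distribution_bruteforce(probs):
    """P(exactly k tuples exist), by enumerating every possible world."""
    dist = [0.0] * (len(probs) + 1)
    for world in product((0, 1), repeat=len(probs)):
        # Probability of this world, assuming independent existence.
        pr = 1.0
        for present, p in zip(world, probs):
            pr *= p if present else 1.0 - p
        dist[sum(world)] += pr
    return dist


def count_distribution(probs):
    """Same distribution in O(n^2) time, adding one tuple at a time."""
    dist = [1.0]  # empty window: zero tuples with probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)  # worlds where this tuple is absent
            new[k + 1] += q * p      # worlds where this tuple is present
        dist = new
    return dist


def sliding_count_distributions(stream, w):
    """Yield the count distribution of each full count-based window of size w."""
    window = deque(maxlen=w)  # oldest existence probability drops out automatically
    for p in stream:
        window.append(p)
        if len(window) == w:
            yield count_distribution(window)


stream = [0.9, 0.2, 0.6, 0.5]  # hypothetical per-tuple existence probabilities
for dist in sliding_count_distributions(stream, 3):
    print([round(x, 3) for x in dist])
```

The brute-force version states the semantics directly (sum the probabilities of all worlds with k present tuples); the dynamic program gives the same answer without enumerating all 2^n worlds.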

Sliding windows over uncertain data streams

Author: Buğra Gedik
C Jin
CC Aggarwal
Gabriela Jacques-Silva
J Halpern
Kun-Lung Wu
L Getoor
L Liao
M Dallachiesa
Michele Dallachiesa
R Cheng
Themis Palpanas
TT Tran
W Kuo
Publication venue: Springer Science and Business Media LLC

Crossref

NADEEF: a commodity data cleaning system

Author: Dallachiesa Michele
Ebaid Amr
Eldawy Ahmed
Elmagarmid Ahmed
Ilyas Ihab
Ouzzani Mourad
Tang Nan
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2013

Despite the increasing importance of data quality and the rich theoretical and practical contributions in all aspects of data cleaning, there is no single end-to-end, off-the-shelf solution to (semi-)automate the detection and repair of violations with respect to a set of heterogeneous and ad hoc quality constraints. In short, there is no commodity platform, similar to general-purpose DBMSs, that can be easily customized and deployed to solve application-specific data quality problems. In this paper, we present NADEEF, an extensible, generalized, and easy-to-deploy data cleaning platform. NADEEF distinguishes between a programming interface and a core to achieve generality and extensibility. The programming interface allows users to specify multiple types of data quality rules, which uniformly define what is wrong with the data and (possibly) how to repair it, by writing code that implements predefined classes. We show that the programming interface can express many types of data quality rules beyond the well-known CFDs (FDs), MDs, and ETL rules. Treating user-implemented interfaces as black boxes, the core provides algorithms to detect errors and clean data. The core is designed to allow cleaning algorithms to cope with multiple rules holistically, i.e., detecting and repairing data errors without differentiating between various types of rules. We showcase two implementations of core repairing algorithms. These two implementations demonstrate the extensibility of our core, which can also be replaced by other user-provided algorithms. Using real-life data, we experimentally verify the generality, extensibility, and effectiveness of our system.

CiteSeerX

Purdue E-Pubs
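The NADEEF abstract describes a split between a programming interface, where each rule says what is wrong with the data and (possibly) how to repair it, and a core that runs rules as black boxes. The following Python sketch loosely mirrors that split under stated assumptions: the class names `Rule` and `FunctionalDependency`, the `detect`/`repair` methods, the dict-per-row data layout, the `clean` loop, and the naive repair strategy are all hypothetical and are not NADEEF's actual API or repair algorithms.

```python
from abc import ABC, abstractmethod


class Rule(ABC):
    """Hypothetical rule interface: what is wrong, and (possibly) how to fix it."""

    @abstractmethod
    def detect(self, rows):
        """Return a list of violations, each a tuple of row indices."""

    @abstractmethod
    def repair(self, rows, violation):
        """Return {(row_index, column): new_value}; may be empty."""


class FunctionalDependency(Rule):
    """lhs -> rhs: rows that agree on lhs must also agree on rhs."""

    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

    def detect(self, rows):
        violations = []
        for i in range(len(rows)):
            for j in range(i + 1, len(rows)):
                same_lhs = rows[i][self.lhs] == rows[j][self.lhs]
                if same_lhs and rows[i][self.rhs] != rows[j][self.rhs]:
                    violations.append((i, j))
        return violations

    def repair(self, rows, violation):
        i, j = violation
        # Naive repair (an assumption): copy the earlier row's rhs value.
        return {(j, self.rhs): rows[i][self.rhs]}


def clean(rows, rules, max_rounds=10):
    """Rule-agnostic loop: call detect/repair on every rule until no violations."""
    rows = [dict(r) for r in rows]  # work on a copy of the input
    for _ in range(max_rounds):
        found = False
        for rule in rules:
            for violation in rule.detect(rows):
                found = True
                for (idx, col), value in rule.repair(rows, violation).items():
                    rows[idx][col] = value
        if not found:
            break
    return rows


rows = [
    {"zip": "10001", "city": "NYC"},
    {"zip": "10001", "city": "New York"},
    {"zip": "60601", "city": "Chicago"},
]
cleaned = clean(rows, [FunctionalDependency("zip", "city")])
print(cleaned[1]["city"])  # prints "NYC"
```

Because `clean` only calls `detect` and `repair`, other rule types could be added by subclassing `Rule` without changing the loop, which is roughly the kind of rule-agnostic core the abstract describes.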

NADEEF

Author: Bohannon P.
Chu X.
Dallachiesa Michele
Fan W.
Raman V.
Swartz N.
Publication venue: VLDB Endowment

Crossref